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Abstract. Wc develop the idea of using Monte Carlo sampling of random portfolios to solve 
portfolio investment problems. In this first paper wc explore the need for more general optimization 
tools, and consider the means by which constrained random portfolios may be generated. A practical 
scheme for the long-only fully-invested problem is developed and tested for the classic QP application. 
The advantage of Monte Carlo methods is that they may be extended to risk functions that are more 
complicated functions of the return distribution, and that the underlying return distribution may 
be computed without the classical Gaussian limitations. The optimization of quadratic risk-return 
functions, VaR, CVaR, may be handled in a similar manner to variability ratios such as Sortino 
and Omega, or mathematical constructions such as expected utility and its behavioural finance 
extensions. Robustification is also possible. Grid computing technology is an excellent platform for 
the development of such computations due to the intrinsically parallel nature of the computation, 
coupled to the requirement to transmit only small packets of data over the grid. We give some 
examples deploying GridMathematica, in which various investor risk preferences are optimized with 
differing multivariate distributions. Good comparisons with established results in Mean- Variance 
and CVaR optimization are obtained when "edge-vertex-biased" sampling methods are employed to 
create random portfolios. We also give an application to Omega optimization. 
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You can't be serious, man. You can not be serious! John McEnroe, 1981. 

1. Introduction. The management of financial portfolios by the methods of 
quantitative risk management requires the solution of a complicated optimization 
problem, in which one decides how much of a given investment to allocate to each of 
N assets - this is the problem of assigning weights. There are many different forms 
of the problem, depending on how large N is (2 — 2000+!), whether short-selling is 
allowed, investor preferences, and so on. In the simplest form of the problem, which 
is still relevant to traditional fund management, the weights are non- negative (you 
cannot go "short") and are normalized to add to unity. So if we have an amount 
of cash P to invest and N assets to choose from, the weights are Wi,i = 1,...,N 
satisfying 

Wl >0 (1.1) 



JV 

5>i = l (1.2) 

»=1 

This long-only form of the problem is of considerable importance, but not the only 
one of interest. Additional constraints are often specified, and this will be allowed for 
in the following discussion. The development of optimized Monte-Carlo tools with 
the addition of short-selling is under investigation. 

The simplest and very ancient solution to this problem is to take Wi — 1/N, which 
is equal allocation, and has been understood for hundreds of years as a principle. The 
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amount of cash invested in asset i is then, in general, Pwi and in the equal-allocation 
strategy it is P/N. The oldest currently known statement of this principle is due to 
one Rabbi Isaac bar Aha, who in an Aramaic text from the 4th Century, stated his 
equal allocation strategy as: A third in land, A third in merchandise, A third in cash 
("a third at hand"). The usefulness of this robust strategy continues to be explored - 
see for example the beautiful study by DeMiguel et al [12] and references therein. 

It has become an in-joke to state that the underlying mathematical theory did not 
move on very quickly after the Rabbi. There were sporadic references in literature. 
Some proverbial mis-translation of Don Quixote is sometimes given as: Tis the part 
of wise man to keep himself today for tomorrow, and not venture all his eggs in one 
basket, but there is in fact no evidence in the original text of eggs. Shakespeare 
illustrated the concept of diversification around the same time in The Merchant of 
Venice, when Antonio told his friends he wasn't spending sleepless nights worrying 
over his commodity investment: Believe me, no. I thank my fortune for it, My 
ventures are not in one bottom trusted, Nor to one place; nor is my whole estate upon 
the fortune of this present year. Therefore my merchandise makes me not sad. It 
helps to know that "bottom" refers to a ship's hull, and that the diversification here 
is intended to prevent the entire cargo going down with a single ship. But note that 
Shakespeare had quietly added temporal diversification. 

There is also a contrarian thread of thoughlQ some people follow Mark Twain's 
contrarian, anti-portfolio investment strategy (from Pudd'nhead Wilson) and seek to 
concentrate on the next big thing: "Behold, the fool saith, 'Put not all thine eggs in the 
one basket' - which is but a manner of saying, 'Scatter your money and your attention', 
but the wise man saith, 'Put all your eggs in the one basket and - watch that basket'." 
A similar quote is also attributed to Andrew Carnegie, and it is often claimed that 
Warren Buffett leans towards a similar philosophy, based on several observations and 
his infamous quote: Concentrate your investments. If you have a harem of 40 women 
you never get to know any of them very well. In the aftermath of the financial crisis, 
the following anti-diversification theme seems even more pertinent]^] If you are stuck 
on a desert island and need a drink of water, and know that one of the three streams 
is poisoned, but not which one, would you take a third of your need from each stream? 
Such debates will continue. 

Putting aide the cultural discussion, the mathematical theory moved on in the 
post- WWII work of Markowitz and collaborators, (see for example [131 031 H] ) ■ My 
own understanding is that Markowitz was in fact interested in the control of downside 
risk (the semivariance), but given the technology then available he used variance as 
a proxy for risk, where the idea was formulated as (in the simplest version), the 
minimization of risk subject to a return goal. This resulted in a problem to minimize 
functions of the form 

N N N 

wjWjCjj - a RjWj (1.3) 

i=l j=l i=l 

where Cyis the covariance matrix of the assets and Ri are the expected returns. With 
the summation convention due to Einstein, this is usually written in either of the more 
compact forms 

WiWjCij — XRiWi — w T .C.w — XR.w (1-4) 

and the minimization is subject to the constraints of equations (1,2) and indeed any 
other additional conditions (e.g. sector exposure, upper and lower bounds etc.) The 

1 My thanks to Michael Mainelli for making me aware of this. 

2 1 apologise for not being able to quote the source of this idea - seen once on the web and not 
found again - correction requested. 
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quantity A is a Lagrange multiplier expressing an investor's risk-return preference, 
and there are other ways of writing down the problem. 

This form of the problem, known as constrained Quadratic Programming (QP), is 
highly restrictive. First there is the underlying idea that the risk is characterized solely 
by the variance. Implicit in this is the idea that the distribution of returns is essen- 
tially Gaussian. Second, if one thinks more generally in terms of maximizing expected 



Utility, Eqn. (1.4) only emerges if one either approximates, or considers exponential 
utility and again Gaussian returns. Third, the answers are extremely unstable with 
respect to changes in covariance or expected return. Fourth, the dependency model 
between returns is quite explicitly Gaussian, expressed by the covariance. Patholog- 
ical covariances with zero or near-zero eigenvalues cause an instability, and optimal 
configurations may have zero weights in some assets and not in others. It will be 
important to realize that interior-point solutions are quite rare in practical examples, 
and optima frequently sit on boundary planes and vertices. 

There are other complications - investors might need more complex constraint 
sets and/or allow for short-selling, where some weights are negative, especially for 
hedge fund applications. These are fairly easily admitted into classical computation 
techniques, but relaxing all the other restrictions is problematic. Recent work on 
Robust Optimization allows for some uncertainty in the parameters, for example, 
with a QP framework. See for example Goldfarb et al [7] and references therein, (55] 
for minimal eigenvalue management. 

Investor preferences might not be expressed in terms of variance-return struc- 
tures. Expected Utility is a common theme in more mathematical studies, but there 
are also specific investment functions that are written down without explicit reference 
to utility. Value at Risk and coherent variations are often considered, but remain con- 
troversial. The measures of VaR and CVaR have attracted a lot of attention following 
the pioneering work of Rockafellar and Uryasev [20]. This work, which I shall refer 
to collectively as the RU-method, opened up a whole new area of research into the 
optimization of full distributional measures of risk (i.e. not just working with simple 
moments). From our point of view a key feature is the modelling of the return distri- 
bution by Monte Carlo sampling of returns. This should be distinguished from what is 
being proposed now, in that we wish to take the additional step of simulating random 
portfolios. The RU-mcthod is ingenious in that it maps the VaR, CVAR optimization 
problem into an LP problem solvable by well-established methods. Nevertheless it 
has the awkward feature that the mapping generally involves the creation of one con- 
straint for each Monte Carlo realization of the portfolio, and therefore raises issues of 
computational scaling as one tries to model the return distribution more accurately, 
and especially when treating low-quantile tail risk. An alternative avoiding this issue 
has been proposed relatively recently by Iyengar and Ma [5]. 

But the choice of investor objective functions does not stop with VaR and CVaR. 
We will remark on other parts of the zocP] Investment performance has been measured 
by a variety of financially-motivated ideas. Expressions based on the Sharpe ratio 
abound. A mathematically neat characterization of one family has been given by 
Farinelli and Tibiletti [3], who define a one-sided performance ratio as a function of 
a benchmark b as follows: 

E\((X - b) + )P] 1 ^ p 
^»'>- EI«t-W/. (L5) 

The choice (p, q) = (1, 2) recovers the Sortino ratio and (p, q) — (1, 1) is the "Omega" 



3 The intention here is not to be comprehensive, but I apologise for leaving out functions felt by 
others to be important. I just want the reader to appreciate that there is a very large zoo indeed, to 
lure you into the idea that it would be helpful to have a single method that could do them all. 



4 



W.T.SHAW 



function [2], discussed in more detail later. Mathematicians on the other hand often 
prefer to work with expected utility based on a choice of utility function, and the 
behavioural finance idea can be represented through prospect functions [3], which 
this author regards as expressible as two-sided utility. 

All these approaches are functions of the full probability distribution in general 
and generally speaking one has to work hard on a case by case basis to do the opti- 
mization. Identification of a gradient function to unleash classical methods may be 
difficult, so a universal gradient-free approach may be very useful. 

1.1. Summary. In general we have some mapping from the distribution of port- 
folio returns to the real numbers that is a function of the weighs of the assets in the 
portfolio. Our goal may be stated as follows. For general choices of: 

1. asset-price evolution model or price probability distribution; 

2. inter-asset dependency model; 

3. portfolio constraints; 

4. investor objective function (including Quadratic Risk-Return, Pure down- 
sided risk measures, "Omega", "Sortino ratio", Utility, Behavioural Finance 
Prospect...); 

with the possibility of requiring robust results in the light of data uncertainty, to 
evaluate the optimal weights in a portfolio. 

The proposed solutionis as follows. The method is to find the optimum portfolio 
by a simple but extensive computational search amongst randomly generated portfo- 
lios, using a pre-created collection of portfolio return samples (as per the RU- method), 
but then 

• simulation of the possible weights in a careful manner; 

• direct calculation of the risk function for each weight; 

• selection of the weights by taking the best outcome; 

• exploiting parallel-processing technology to create a feasible system for large 
problems, by a trivial best of the best model, where different threads are 
fed the same return distribution but different seeds for the random portfolio 
generation. 

The method is summarized as Monte Carlo Portfolio Optimization in a Parallel En- 
vironment (MC-POPE). 

We wish to emphasize that the use of Monte Carlo methods for creating portfo- 
lio return realizations in connection with optimization is not new - this part of the 
concept was certainly explicit in the work of Rockafellar and Uryasev [20] on CVaR 
optimization. Also, the generation of random portfolios for assisting portfolio analysis 
in the context of diverse types of risk analysis is now available as a commercial tool 
- see the website for "PortfolioProbe" for example I9|. However, we believe that the 
combination of random portfolio sampling linked to risk optimization, with the means 
of random portfolio generation itself optimized for the risk optimization problem, is 
an approach rather in its infancy, and this paper is a contribution to that thread of 
thought. 

The parallel environments employed for our basic research are: 

• Grid Mathematica for idea development, testing, proof-of-concept and small- 
scale deployment. 

• GPU cards running CUDA or OpenCL for larger-scale problems. 

It will be evident that the plan involves existing and well-known mathematics 
apart from the notion of efficiently sampling random portfolios, and it is to this that 
we now turn, and devote the bulk of our analysis. 



4 It is at this point the reader may refer back to my quote from McEnroe. 
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2. The nature of optimal portfolios. Monte Carlo simulation has in the past 
largely focused on the calculation of an expectation, and the simulation is employed 
to generate a numerical realization of a measure. Here we are trying to locate optima 
so the importance of the measure changes somewhat and we have some latitude about 
what to do. But there are bad measures and good measures and the meaning of this 
is a matter linked to the particular nature of non-linear optimization. To get a grip 
on this we will introduce a simple test problem in QP form. Consider the case of zero 
expected return and the pure variance-minimization problem with N = 3 defined by 
the parametrized covariance given by 

/ 64 120r 25 \ 
Cij = 120r 225 50 (2.1) 
\ 25 50 100 / 

where — 1 < r < 1. The solution to this problem as a function of the correlation 
parameter r can be found analytically (given later) or with any number of off-the- 
shelf QP solvers and the results look like this, where we plot the 3 weights as a function 
of r: 




Fig. 2.1. Minimum variance weights 

The details of this example do not matter yet. What matters is that for some r 
the solution is an interior point where all weights are non-zero and when r is close 
to ±1 one of the weights is zero. This is a typical feature of QP solutions. While 
LP problems always have solutions on the boundary of the feasible region, but QP 
problems may be interior point but will frequently have solutions with several zeroes 
in the weights for particular assets. We will return to this presently. 

3. Random portfolios. We will start with a basic approach, based on the 
comments at www.excel-modeling.conj^J and while it is not what is ultimately needed, 
it provides a useful clue to improve matters. It should be noted that no criticism of the 
general integrity of the models on that web site is intended, and the author recognizes 
that their public web site might just contain a simple example for illustration only. I 
quote from the web site of excel-modeling.com: 1. Generate 3 random numbers. 2. 



5 The website as of August 2010 reports this approach. 
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Assign the ith random number divided by the sum of the three random numbers as the 
weight for stock i. The procedure above ensure each weight will be fairly distributed. 

This is an interesting idea. However, the statement at the end is in fact false, 
if the underlying random numbers are uniform on (0, 1 J] - we shall see later what 
distributions make this procedure more sensible. Such a procedure creates an uneven, 
biased distribution. It is easiest to see what goes wrong with a picture that shows the 
biased distribution of 4,000 samples of the simplex: 

Wi + U> 2 + W 3 = 1 , Wi > (3.1) 

The bias is evident from the scatter plot, that reveals that the distribution has 




Fig. 3.1. Simple biased simplicial sample 



a peak at the equally-weighted location, and subsidiary peaks in the middle of the 
edges. There is strong bias away from the vertices. This gets worse as you increase 
the dimension, as can be checked by some simple numerical experimentation. 

We will be unlikely to find correct optima buried on any vertices or edges using 
this method, which should be avoided in the view of this author. Note, however, that 
we wish to reserve the option to bias, but in a different way. I will return to this later. 

The next task is to properly characterize the methods for generating un-biased 
random portfolios with the appropriate constraints. I experimented with this for a 
while and came up with two methods, only to find the basic ideas were all properly 
written down over two decades ago. The theory for the simplest case was all written 
down by Luc Devroye in Chapter 5 of his beautiful book Non-uniform Random Variate 
Generation^ . The idea of so-called "simplicial sampling" is also at the heart of many 
copula sampling methodologies. I will explain the three equivalent methods directly, 
but the reader is urged to consult the Devroye text. 

3.1. Even simplicial sampling method 1. This is based on a simple gap 
distribution. To get k weights satisfying the constraints, generate k — 1 points in the 
unit interval, sort them in increasing order and then consider the gaps, including the 
end points at zero and one. Note that this requires a sort in order to define the gaps 
from left to right sequentially. In the case TV = 3 we sample two points U\,U2, let 
Vi = min(C/i, U2), V2 = max((7i, £/ 2 ) and just write down 

( Wl ,w 2 ,w 3 ) = (V 1 ,V 2 -V 1 ,1-V 2 ) . (3.2) 



6 The company quoted may well have more sophisticated algorithms or different underlying dis- 
tributions 
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The result is then an beautifully even distribution (shown with 2000 points): 




Fig. 3.2. Simple unbiased simplicial sample based on gaps 



3.2. Even simplicial sampling method 2. Can we avoid the sort? We use 
the same concept as before but think about drawing the points directly in order. The 
largest sample is drawn directly from the maximum distribution of k uniform samples 
on (0, 1). Then if this is the random variable XMaxi we draw the next biggest sample 
from the distribution of the maximum of k — 1 uniform samples on [O^mim], an d so 
on. The CDF of the p-maximum on the unit interval is easily seen to be x p ', so the 
quantile of the p-maximum on (0, y) is y * u^^K So we work our way down using the 
quantile to generate samples from other uniform variables. The gaps are then used as 
before and the same picture is obtained. The author's own experience of this method 
is that it is slower than the first one, but this seems to be a function of the relative 
inefficiency of working out many fractional powers compared to a simple sort. The 
maturity of sort routines seems to ensure that method 1 wins over method ^\ 

3.3. Even simplicial sampling method 3. This uses the idea that we take 

wt = /* , (3.3) 

but chooses the distribution of the Zi much more intelligently. The Zi are taken from 
a one-sided exponential distribution with density 

f(x) = \e- Xx (3.4) 

for any choice of positive A. An elementary computation with exponential and gamma 
distributions reveals that the CDF of e.g. w\ is given by 

F{x) = 1 - (1 - x) N ~ l (3.5) 

which is the CDF of the minimum of N — 1 points. This is just what is required 
when we compare with method 2. The simulation of one-sided exponentials is trivial 



7 This was tried in Mathematica V7 - it is possible that other balances may be found in different 
computing environments 
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as there is an explicit quantile function. Working it out the value of A drops out and 
we can lose some minus signs and simplify to 



u, 



(3.6) 



where the Ui are uncorrelated uniform random variables on (0,1). The view of the 
author is that this is the method of choice if you want to sample a simplex or subject 
of it evenly - the issue of whether such even sampling is optimal is a separate matter 
and we turn now to the testing of this. 

4. First test problems. In order to develop a basic understanding of the Monte 
Carlo portfolio sampling and its application we first apply it to some simple QP 
problems. This has the advantage that we can establish the basic feasibility of the 
approach and highlight any difficulties. In this case we do not need to couple to 
samples of the return distribution, as the risk measure is a trivial function of the 
simulated weights. This allows us to check basic issues, but will of course leave open 
the question of computational feasibility of the system when both the portfolios and 
the distributions are sampled simultaneously. 

We first consider again the 3-dimensional pure variance-minimization problem 
introduced by the covariance of Eqn. (2.1 1. The solution to the problem^] is given by 
the following formula for (u>i, W2), with W3 — 1 — w\ — W2 ■ 

'225-120r 1 225-120r\ 



V 289-240r ' 289-24GV > 

/ 5(48r-125) 9(40r-17) s 

V576r 2 +240r-1001 ' 576r 2 +240r-1001 > 

(0.657895,0) 



if r < -0.383782 
if - 0.383782 < r < 0.425 (4.1) 
if r > 0.425 



Some sample values are given in the following table. Now we write down some very 



r 


m 


W2 


w 3 


-0.5 


0.696822 


0.303178 


0.0 


0.0 


0.624376 


0.152847 


0.222777 


+0.5 


0.657895 


0.0 


0.342105 



Table 4.1 
Sample weights for N = 3 optimization 



simple code in Mathematica to solve the problem by the Monte Carlo method and the 
even simplicial sampling algorithm based on the exponential distribution. Here is the 
code: 

cov3[r_] := {{64, 120 r, 25}, {120 r, 225, 50}, {25, 50, 100}}; 
{cova, covb, cove} = {cov3[-l/2], cov3[0], cov3[l/2]}; 
testa [mat_] := 

AbsoluteTiming [rawa = Log [RandomReal [{0 , 1}, {10~5, 3}]]; 
triala = Map [#/Apply [Plus , #] &, rawa]; 
First[SortBy[Map[{#.mat.#, #} &, triala], First]]]; 
Map[testa[#] &, {cova, covb, cove}] 

and this returns the output: 

0.803038 {26.4108, {0.693656, 0.306265, 0.0000786956}} 
0.803102 {45.5295, {0.623704, 0.152807, 0.223489}} ) (4.2) 
0.808720 {50.6587, {0.656182, 0.000037375, 0.343781}} 



8 One can do this one analytically with Lagrange multipliers taking care of the boundary, and 
the same answers are obtained numerically by direct application of the FindMinimum function in 
Mathematica 7, or indeed by the use of M.J.D. Powell's TOLMIN algorithm (TH). 
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This suggests that the method is basically sound, taking 0.8 seconds on a 2.8GHz 
machine to produce acceptable answers. We will return to prompted questions such 
as the precision of the zero- weight solutions, but the reader may wish to satisfy them- 
selves that without the application of the log function the results are not as good 
with the same number of samples, and the computation typically still takes about 
0.79 seconds. 

4.1. Explicit Parallelizability. Even with such a simple test problem, we can 
exhibit the trivial way the solution may be parallelized. It is especially easy with Grid- 
Mathematica, where one merely needs to launch slave kernels, distribute the relevant 
definitions, and ask each slave to find its own optima, then take the best of the best. 
The code is a trivial extension of that used previously, and in the following example 
we exhibit the result from 8 million portfolios considered on an 8-core machine: 

LaunchKernels [] ; 
DistributeDef initions [cove] ; 

AbsoluteTiming [parallelruns = ParallelEvaluate [ 

raw = Log[RandomReal[{0, 1}, {1CT6, 3}]]; 

triala = Map [#/Apply [Plus , #] &, raw]; 

First [SortBy [Map [{#. cove. #, #} &, triala], First]]];]; 
First [Sort [parallelruns] ] 

which returns output, e.g. from one run of around 12 seconds, of 

{50.6582, {0.656187, 8.4297 x 10~ 7 , 0.343812}} (4.3) 

We will be able to do rather better still, as we shall see later - the point here is that 
best-of-the-best parallclization is easy. 

4.2. General non-linear constraint management by rejection. In prac- 
tice the constraint sets are more complicated. If we stay within the long-only fully 
invested framework but impose further inequality conditions the simulation remains 
very straightforward. We simply reject those random samples not satisfying the con- 
straints. Constraints might arise for a variety of reasons. We might want a portfolio 
"beta" to lie between certain limits, in order to constrain the portfolio compared to 
index behaviour. We also might wish to impose lower or upper bound conditions on 
individual weights, or have the total weight in a given market sector constrained to be 
above or below a certain level. Such inequality conditions are well suited to rejection. 
Equality constraints either have to be relaxed to a tight pair of inequalities to ap- 
ply this method, or treated by a different method altogether. Rather than demanding 
{3 = 1, for example, we might reject configurations unless they satisfy 0.95 < /3 < 1.05. 

With our working example, suppose we demand that in addition 

wi > i , w 2 + l.liu 3 > \ ■ (4-4) 

To treat this we just throw out the samples that do not satisfy these conditions and 
optimize over those that do. Note that there is no requirement that the constraints be 
linear. In Mathematica the rejection may be achieved by a trivial application of the 
Select command. Here we do a single-core example 

AbsoluteTiming [rawa = Log [RandomReal [{0, 1}, {10~5, 3}]]; 
triala = Map [#/Apply [Plus , #] &, rawa]; 
constrained = 

Select [triala, (#[[1]] >= 1/3 && # [ [2] ] + 1 . 1 # [ [3] ] > 1/2) &] ; 
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First [SortBy [Map [{#. cova.#, #} &, constrained], First]]] 

which returns the outcome 

{0.853075, {32.2394, {0.518711, 0.293995, 0.187294}}} 



(4.5) 



We note that the calculation takes marginally longer, and compares favourably with 
the direct QP optimization: 

QPweightsB [r_] := FindMinimum[{{x, y, z} . cov3 [r] . {x, y, z} , 
x>=0, y>=0, z>=0, x+y+z==l, 

x > 1/3, y + 1.1 z > 1/2}, {x, y, z}] ; 

QPweightsB [-0.5] 



{32.2379, {x -> 0.518873, y -> 0.292396, z -> 0.188731}} 



(4.6) 



The imposition of a complex set of constraints is clearly straightforward, at the price of 
discarding rejected samples. Note further that in this last example we have exhibited 
the value of the optimal objective function from both the QP solution and the Monte 
Carlo case, and they agree to four significant figures. This is an illustration of the 
notion that considering the best of a large number of simulated portfolios may be 
"good enough" to get close to optimal risk. While we will see shortly that we can do 
better still on the precision of the answer, it is defensible from the point of view of 
providing investment advice to argue that one has considered a very large number of 
cases and is presenting the best result from that large set. 

4.3. More realistic problems: a 6D example. We now increase the dimen- 
sion to N = 6 and attempt to break the method by simulating many covariance 
matrices and exploring the outcome. The outcome of these studies will be condensed 
to exhibiting a nicely pathological example that shows we need to work harder still 
to make this approach viable. We will study the following positive definite covariance 
matrix: 



V 



0.0549686 
0.144599 

-0.188442 

0.0846818 
0.21354 

0.0815392 



0.144599 
1.00269 

-0.837786 
0.188534 
0.23907 

-0.376582 



-0.188442 
-0.837786 
1.65445 
0.404402 
0.34708 
-0.350142 



0.0846818 
0.188534 
0.404402 
0.709815 
1.13685 
-0.177787 



0.21354 
0.23907 
0.34708 
1.13685 
2.13408 
0.166434 



0.0815392 
-0.376582 
-0.350142 
-0.177787 
0.166434 
0.890896 



/ 

(4-7) 

This matrix has been selected from a large number of randomly-generated oneaj The 
optimum weights are given by 



(0.883333,0,0.11667,0,0,0) 



(4.8) 



and we note that four weights are zero and most of the portfolio is concentrated in 
one of the two remaining assets. 

The even simplicial sampling scheme introduced so far struggles to capture this 
optimum. For example, in an 8 x 10 5 parallel run on an 8-core machine, an example 
of the output is 



{0.710654, 0.0596266, 0.156866, 0.00340821, 0.00300641, 0.0664388} , 



(4.9) 



9 In fact we simulate Cholesky decompositions and then square up to recover valid covariance 
structures - the issue here is not the possibility of zero or negative eigenvalues that may arise through 
trader interference, but edge-vertex issues. 
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and with 8 x 10 6 samples we only arrive at 

{0.82345, 0.0141151, 0.125324, 0.000214034, 0.00184072, 0.0350566} . (4.10) 

This raises a new and interesting question over the three even-sampling schemes. 
They manifestly sample the simplex in an even fashion, but this is actually not always 
what is needed. If you look at the marginal distribution of any one of the weights, 
for example, in the gap approach this is the distribution of the minimum, and it is 
strongly biased towards zero (the bias towards zero increases as N increases). This 
means that it is very hard to find configurations where one or two weights have a high 
value. A related and more subtle problem is that although we are probing the interior 
in an even manner, the optimum above has four weights that are essentially zero, so 
that we need to think harder about configurations on a boundary, and to do so in a 
way that will not compromise finding interior optima that are, e.g., close to 1/N. 

5. Changes of measure: Edge- Vertex Biased schemes. The methods for 
evenly-sampling the interior do not readily capture the edge and vertex solutions, 
as has been made clear by the N = 6 test. We need some method of changing the 
measure so as to also capture solutions along the edges and vertices. In approaching 
this one can consider a theoretical approach, e.g. trying to optimize the capture of 
both interior and boundary points, among a class of methods such as 

Wi = — . (5.1) 

£f=i/(fi) 

One might also consider separate sampling of faces, edges and vertices at the price of 
some combinatorial complexity. 

However, it seems that a very effective solution can be obtained by combining a 
little insight from the theory of norms with some numerical experimentation. If we 
consider the L q norm on a vector Vi in the form 




ii^ii 9 =(>^;"1 • (■'■•->> 

we know that as q — > oo then 

||V||,-»-maxVj . (5.3) 

i 

In the current context we can adapt this idea by observing that, except in the case of 
two or more Ui being identical, the quantity 

will approach, as q — > oo a vector of all zeroes except for a one in the slot corresponding 
to the largest Ui. Intermediate values of q will have intermediate concentrations of 
entries in the higher entries. In order to keep matters computationally trivial, we will 
consider q — 2 P for p = 0, 1, 2 . . . , so that the bias evolves by elementary squaring. In 



Figure ( 5.1 1 we consider p = 0, 1, 2, 3, 4, 5 so that we go to to q = 32 based on 2000 



initial points. 

6. Second test phase. We now return to the testing of the modified scheme. 
We can re-investigate both our N = 3 and N — 6 example without or with any 
parallelization. We will focus attention on the pathological N = 6 case, as it is the 
location of boundary optima which concerns us. The idea is to 
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• create k samples Ui m , I — 1,...N m — l,...,k from the iV-dimensional 
hypercube, 

• form all their iterated squares Uf m up to some power (e.g. 32 = 2 5 ), 

• form all the simplex samples 

r/ 2P 

Wl,mp - N , 

• Minimize the risk function over all m and p. 

A sample calculation was done where we took k = 20, 000 on each of 8 processors, with 
iterated squaring to 32, so that 8 x 20,000 x 6 = 960,000 portfolios were considered 
in a 2sec computation, with the weights outcome 

{0.88333, 2.0E-14, 0.116658, 3.9E-8, 7E-43, 0.00001166} (6.1) 

So the problem has been solved. Fig. |5.1| makes it clear that both interior and bound- 
ary points may be found with equal ease by considering the union of all the samples 
in the six scatterplots. 
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7. Full Distributional Risk Measures. Having established the feasibility of 
an optimization based on edge- vertex-biased sampling, for directly computable func- 
tions of the weights we now generalize to the case where the multivariate return 
distribution is itself simulated. What we need to do is essentially the same as before, 
except that rather than working out simple functions of the weights such as w.C.w, 
we have to consider functions of the form 

Risk(w, samples, params) (7-1) 

where the Risk function has inputs that are weights, the entire distribution realized 
in terms of samples, and auxiliary parameters such as 

• Lagrange multipliers for QP problems; 

• expected returns; 

• quantile levels for VaR, CVaR applications; 

• p,q,b values for Omega, Sortino; 

• parameters of utility or prospect functions; 

• .... 

In principle we just compute the samples in advance and send them into the compu- 
tation as a big extra matrix parameter, distributing them to all the cores in the case 
of a parallel computation. 

7.1. Consistency, stability and sample size. In moving to a characteriza- 
tion of risk through a sampled distribution we must tackle an important issue that 
arises whenever a multivariate system is realized through samplinj^j The volatility 
and dependency structure may be characterized in various ways, but for the sake 
of explaining this idea we will assume that a covariance has been defined from some 
dataj^J We call this the input covariance. We now sample the distribution using these 
parameters and then recompute the realized covariance. The realized covariance will 
not be the same as the input covariance, though we expect that the two will get closer 
in some sense as the number of samples increases. 

Given that, e.g. in the case of QP, the computed optima may jump around as a 
function of the dependency data supplied, this needs careful management. I do not 
have a solution to this beyond suggesting that the number of return samples employed 
should be as large as is practical, and that the stability of the optimal weights as the 
number of return samples is increased should be considered. In general up to five 
distinct computations might be possible, and all five are possible in the case of a QP 
problem, which gives scope for working out "how much is enough" : 

1. Analytic QP on the input covariance; 

2. Analytic QP on the realized covariance; 

3. Monte Carlo Edge-vertex biased sampling of the simple analytic function (as 
above) with input covariance; 

4. Monte Carlo Edge-vertex biased sampling of the simple analytic function (as 
above) with realized covariance; 

5. Monte Carlo Edge- vertex biased sampling with full distributional return sam- 
pling, implicitly with realized covariance. 

This might seem somewhat burdensome, but we are just looking for reasonable 
agreement between computations 1 and 3, and/or 2 and 4 in order to be sure that we 
have a sufficient number of random portfolios, and then agreement between 1 and 2, 4 
and 5 to know we have enough samples of the return distribution. While some analytic 
estimates of when this works are desirable, I will continue in the rather pragmatic 



1 Similar issues arise with basket option pricing. 

11 It could equally well be the parameter of a chosen Archimedean copula together with some 
variances. 
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experimental spirit of this paper in trying to find out what works. I will continue with 
the pathological N — 6 example where we have done the work of comparing 1 and 
3, so from here on it is all about the number of samples in the return distribution. 
Some example results for calculation 1 vs 2 are given in the following table, using 
Mathematica's FindMinimum function. The norm function A is given by 

A = Maxj ; j|(7jj (realized) — Cij(input)\ (7.2) 

and the zero weights are omitted, all being small in the examples computed. 



k 


A 




w 3 


100 


0.2721 


0.877 


0.123 


1000 


0.077 


0.880357 


0.116333 


10000 


0.01976 


0.883597 


0.116403 


100000 


0.10102 


0.883168 


0.116832 


input 





0.883333 


0.11667 



Table 7.1 

Covariance and weights stability as function of return sample size, exact QP 

Of course, this is all Monte Carlo, so recomputation causes this table to change, 
but our general experience is that 10, 000 samples is a minimum safe number. Going 
beyond this helps the solution precision, and is of course necessary if one wishes to 
probe VaR or CVaR measures deep into the tail. 

We now jump straight to calculation type 5, where both the portfolios and the 
return distribution are simulated. This is our first example of "The Full Monte" . The 
computation of the return distribution samples is done in advance, and results in a 
10000 x 6 matrix given the symbol mixed. The calculation is shown as a Mathematica 
session using some prototype packages and built-in simulation tools to do the work. 
First we define a multivariate uncorrelated normal sampler, a random Cholesky matrix 
simulator, seed the random number generator to reproduce the pathological case under 
consideration and then create the correlated samples. 

rawnormal [n_ , dim_] := RandomReal[NormalDistribution[0, 1], {n, dim}] 
SeedRandom [20] 
ranchol [n_] : = 

Table [If [i > j, 0, RandomReal [{-1 , 1}]], {i, 1, n>, {j , 1, n}] 
six = ranchol [6] ; 

rawb = RandomReal [NormalDistribution[0, 1], {10000, 6}]; 
mixed = Map[#.six & , rawb]; 

At this stage we have 10000 6-vectors with the realized covariance. Our package 
consists of a suite of prototype risk functions. A load and query command pair 

Needs ["PMCOptimizer 'RiskFunctions 1 "] 
?PMC0pt imizer ' RiskFunctions 1 * 

outputs a list of functions containing objects like ConditionalValueAtRisk, Omega, 
ValueAtRisk, MeanVariance , NegativeSharpeRatio and so on. A second compo- 
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nent of the package contains the optimizers developed so far based on this approach, 
which at this stage has a developed form of the long-only problem optimizer with no 
auxiliary constraints (the addition of further constraints is not difficult and is being 
functionalized) . 

Needs ["PMCOptimizer 1 Optimizers ' "] 
?PMC0pt imizer ' Opt imizer s ' * 

returns the information: LongOnlyPMCOptimizerfsamples, numberof assets, basesearch- 
size, evbiasdepth, riskfunction, riskfunctionparameters] minimizes the supplied risk 
function with given riskfunctionparameters, based on an input of terminal multi-asset 
samples for numberof as sets assets, using a basesearchsize and a defined iterative vale 
of evbiasdepth. 

E.g. of use LongOnlyPMCOptimizer[mixed, 6, 1000 5, MeanVariance, 0.0] 
Does classic mean-variance optimization with zero return tilt for a sample defined 

by the N by 6 matrix mixed, with 1000 basic simplex samples iteratively biased five 

times. The method is parallelized. 

So now we just have to run it using the command (the seeding is to produce 

reproducible random results from a bunch of slave kernels - in live applications one 

would probably do this to avoid "turnover" generated by Monte Carlo noise.) 

ParallelEvaluate [SeedRandom[$KernelID*100] ] ; 
LongOnlyPMCOptimizer [mixed, 6, 1000, 6, MeanVariance, 0.0] 

This returns the result 

{9.983915, {0.0274786, (7.3) 
{0.882869, 1.22605 * 10~ 14 , 0.117122, 9.9529 * 10~ 7 , 1.2674 * 10~ 15 , 8.05487 * 10~ 6 }}} 

which is a 10 second computation on my deskside 2008 Mac Pro 8-core. 

The author is happy to stipulate that this is possibly the least efficient QP 
algorithm yet written down. The upside is that now we can replace the function 
MeanVariance with any function of the distribution of portfolio returns and the speed 
of the method will scale easily inversely with the number of cores deployed, provided 
that each core can store the return distribution and do the basic computations and 
sorting. Furthermore the precomputation of the distribution needs only to be done 
once and can be based on any multivariate structure, empirical data, or the result 
of evolving some complicated stochastic model. This latter feature is of course in 
common with the RU-method [20j but the ability to swap in any risk function is a 
novel one, and the size of the return distribution can be increased provided each core 
has sufficient memory and the sort time does not become prohibitive. 

7.2. Example: VaR/CVaR/Sharpe Ratio/Omega optimization. To il- 
lustrate the flexibility of this method, we now replace the MeanVariance function 
with other functions of the full distribution, again with controlled seeding. This is 
a somewhat vacuous test however, as for multivariate zero-mean Gaussian we would 
expect the same optimal weights to be obtained for corresponding calculations, pro- 
vided enough samples are taken. For example, both the VaR and CVaR are obviously 
minimized by making the portfolio distribution as narrow as possible, which would 
reduce to pure variance minimization. So the similarity in the following results for 
multivariate Gaussian is in fact a useful check that we are doing the right computa- 
tions. 
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In [45] := ParallelEvaluate [SeedRandom [$KernelID*100] ] ; 

In [46] := LongOnlyPMCOptimizer [mixed, 6, 10~3, 6, ValueAtRisk, 0.025] 

Out [46]= {12.210430, {0.327577, {0.871788, 4.58122*10~-30, 0.128205, 
1.40685*10~-6, 5.2755*10~-6, 3 . 54997*10~-8}}} 

In [49] := ParallelEvaluate [SeedRandom [$KernelID* 100] ] ; 
In [50] := LongOnlyPMCOptimizer [mixed, 6, 
10~3, 6, ConditionalValueAtRisk, 0.025] 

Out [50]= {12.627039, {0.393196, {0.882869, 1 . 22605*10~-14, 0.117122, 
9.9529*10~-7, 1.2674*10"-15, 8 . 05487*10~-6}}} 

In [51] := ParallelEvaluate [SeedRandom [$KernelID* 100]] ; 

In [52] := LongOnlyPMCOptimizer [mixed, 6, 10~3, 6, MeanVariance , 0.0] 

Out [52]= {9.920736, {0.0274786, {0.882869, 1 . 22605*10~-14, 0.117122, 
9.9529*10~-7, 1. 2674*10^-15, 8 . 05487*10~-6}}} 

In [53] := ParallelEvaluate [SeedRandom [$KernelID* 100] ] ; 
In [54] := LongOnlyPMCOptimizer [mixed, 6, 
10~3, 6, NegativeSharpeRatio, -1.0] 

Out [54]= {9.944519, {-6.01598, {0.882869, 1 . 22605*10~-14, 0.117122, 
9.9529*10~-7, 1.2674*10~-15, 8 . 05487*10~-6}}} 

In [55] := ParallelEvaluate [SeedRandom [$KernelID* 100] ] ; 

In[56] := LongOnlyPMCOptimizer [mixed, 6, 10~3, 6, NegativeOmega, -0.5] 

Out [56]= {64.089293, {-4169.27, {0.889907, 1 . 08026*10~-6 , 0.109982, 
6.87834*10~-9, 3 . 13299*10~-17 , 0.000109915}}} 

Changing the nature of the input distribution mixed clearly will lead to a richer set 
of results, but the results above may also be regraded as a testament, if any were 
needed, to the substance of Markowitz's original insight. 

7.3. Example: CVaR optimization vs Rockafellar-Uryasev. Considerable 
interest has been stimulated in CVaR optimization in part due to its coherence proper- 
ties but mainly due to the development of an ingenious solution technique in |20) . We 
can use this as a further comparison check. Rockafeller and Uryasev considered, as a 
test problem, the optimization of a portfolio consisting of three assets with covariance 
given by 

/ 0.00324625 0.00022983 0.00420395 \ 
dj = 0.00022983 0.00049937 0.00019247 (7.4) 
\ 0.00420395 0.00019247 0.00764097 / 

and expected returns given by 

Ri = {0.010111, 0.0043532, 0.0137058} (7.5) 

They analyzed the problem in various forms, but here we will focus on the CVaR 
optimization results presented in Table 5 of [20 . This is an interesting comparison 
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as the returns, as here, are represented by direct simulation, so we can do a reason- 
able like-with-like comparison between their LP approach and the proposed random 
portfolio method. We must of course bear in mind the 10 year difference in computer 
speed. The were working on a 300 MHz machine whereas the results here are at 
2.8GHz, with optional multi-core technology. 

This part of the work of |20j , in common with our own study, makes a comparison 
with traditional QP based on Gaussian statistics, based on the fact that the optimal 
CVaR, if done correctly, should coincide with the simple minimum variance portfo- 
lio. Rockafellar and Uryasev find the optimal weights under the minimum variance 
approach to be 



with the resulting minimum variance of 0.00378529 and the CVaR results being 



and the three considered quantiles of 0.1,0.05,0.01 respectively. A quick application 
of FindMinimum in Mathematica produced an identical optimal variance and optimal 
weights agreeing to 5 sig. figs. 

For the direct Monte Carlo CVar optimization the parameters employed are: J = 
1000, 3000, 5000, 10, 000, 20, 000, 100, 000 for the return distribution size. Our previous 
experiments suggests that the results to take seriously are those with J > 10, 000 and 
a good number of sample portfolios. We consider base portfolio sample sizes based 
on our early verification, so we have 20000 base portfolios iterated to the 32nd power, 
on each of 8 kernels, so that 960,000 random portfolios are considered. In the light 
of the results being close to equal weight, we could consider doing a recomputation 
with less EV-bias and more base portfolios, but would prefer to leave this out here - 
it might appear to be using some foresight, and in any case the matter of intelligent 
refinement is a subject of current research. The quantiles considered are, as with the 
RU paper, 0.1,0.05,0.01. 

Inspection of the tables reveals some interesting observations. It is clear that 
there is significant instability in the results for J < 10, 000, confirming our earlier 
circumspection on having a good-sized portfolio return sample. For deep tail CVaR 
it is also clear, as might have been expected, that for the 0.01 quantile rather more 
samples are needed. The timings become awkward as we let J push through 100, 000, 
but it must also be borne in mind that not only is this a prototype code using largely 
interpreted Mathematica, we are only just in the early years of multi-core computation 
and the times can be slashed by parallel deployment. This will be necessary, as is 
made clear by the one case of J = 200, 000 added for the 0.01 quantile case. 

As in the RU-study, these results can be recomputed with a variety of different 
simulation methodologies. RU recomputed with quasi-random sobol sequences gen- 
erating the return samples. We could of course apply this notion to both our return 
distribution, and, given that the portfolio samples also originate on a hypercube, these 
might be similarly generated, though the interaction with the parallelization would 
need thought - the pure Monte Carlo approach parallelizes rather trivially provided 
each core or kernel can store the return distribution. 

Of course, we can recompute examples like this with any suitable risk function, 
which is the payback for the increased computation associated with the random port- 
folio method. 



{0.452013, 0.115573, 0.432414} 



(7.6) 



{0.096975, 0.115908, 0.152977} 



(7.7) 
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0.0540555 


0.333967 


0.117257 


+0.001349 


5.5 


0.05 


3000 


0.383387 


0.141948 


0.474665 


0.11562 


-0.00028 


18.5 


0.05 


5000 


0.451571 


0.115731 


0.432698 


0.11573 


-0.000178 


35.0 


0.05 
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0.462245 


0.111638 


0.426117 


0.114866 
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71.6 


0.05 


20000 


0.477872 


0.105622 
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QP 
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0.115908 
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0.575114 


0.0681314 


0.356755 


0.146881 


-0.006096 


5.5 


0.01 


3000 


0.5432 


0.0804313 


0.376368 


0.150387 


-0.00259 


18.2 


0.01 


5000 


0.306275 


0.171566 
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0.15732 


+0.004343 


35.3 


0.01 
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0.492627 
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0.153361 


+0.000384 


69.9 


0.01 


20000 
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0.0958251 


0.400868 
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-0.000856 


142.2 


0.01 
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0.131996 


0.458818 


0.152366 


-0.000611 


187.9 


0.01 


200000 


0.46145 


0.11193 


0.42662 


0.152727 


-0.00022 


1809.8 


0.01 


QP 


0.452013 


0.115573 


0.432414 


0.152977 





NA 



Table 7.2 



Recomputation of Rockafellar- Uryasev CVaR example, 960, 000 Random Portfolios 



7.4. Omega optimization. The Omega function was introduced by Cascon, 
Keating and Shadwick (CKS)[2]. While introduced independently, with hindsight it 
may be viewed as a special case of the variability ratios introduced earlier. If we 
consider 

^-"m-W^M (7 - 8) 

we obtain an object which is superficially similar to the price of a Call option struck 
at b divided by a corresponding Put. This is a slightly misleading viewpoint as the 
expectations have to be taken in the real-world measure. Writing out the expectations 
explicitly in terms of a pdf f(x) and integrating by parts leads to the form originally 
written down by CKS: 

_ Q -*■<*)) (7 . 9) 

where F(x) is the CDF (cumulative distribution function) with F'(x) = f(x). For 
the financial motivation see [2]. 

These is an interesting objective function, and its optimization is consistent with 
some traditional approaches in a Gaussian environment. Let X have a distribution 
that is Gaussian with mean /i and standard deviation a. Let the standard normal 
density function be 4>{x) and CDF Q(x), with $'(x) = <j>(x). Then some algebra and 
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calculus gives us 



where 



v ; (f>(z) + z$(z) v ; 



b — u . 
z = ^ (7.11) 



We can check that the RHS of Eqn. ( 7.10 ) is a strictly decreasing function of z. Hence 
the maximization of £1 in a Gaussian model is simply the minimization of (b — p) /a 
for the portfolio. This is just the minimization over allowed choices of weight, of the 
negative of a Sharpe ratio with benchmark the threshold b instead of the risk-free rate. 
In general, with other distributions with possible non-trivial skewness and kurtosis, 
we get something else, and its optimization in general is less straightforward. 

The optimization of f2 over possible portfolio weights has been considered by a 
few writers in the last decade. Kane et al |10j considered an optimization based on 
empirical data for three assets. One approach they took was a Nelder-Mead algorithm, 
well adapted to problems such as this where the gradient is not readily specified, and 
other methods were used for higher-dimensional problems. Unfortunately, not all the 
properties of the input data are explicitly specified in this work so we are not able 
to do a detailed cross-check to see if we recover the same outputs. Mausser et al [16] 
describe an elegant transformation of the optimization to a problem in linear program- 
ming, but only summarize the moment statistics of the marginals, making a detailed 
benchmarking exercise against their results more complicated. Passow [T7] gives an 
elegant analytical treatment in terms of Johnson distributions for the portfolio, but 
again is not fully explicit about the nature of the dependency structure^ A more 
explicit example has been constructed by Avouyi-Dovi, Morin and Neto (ADMN for 
short) pp, who characterize their model completely in terms of a T-copula dependency 
model and GARCH evolution on the marginals. 

In view of the complications on explicitness or model complexity in published 
worlip^| we will propose a simpler benchmark for comparing fl optimizers, based on a 
simplification of the model in [1 . This will hopefully be much more accessible to others 
for cross-testing with no ambiguities about parameters and no modelling threshold 
other than having a candidate il-optimizer. We aim to revisit the full analysis of [T] 
in later work. 

7.5. Simplified ADMN Omega optimization model. In the ADMN model 
a portfolio of three indices (A, B, C) = (Germany, UK, US) is considered. In our 
simplified model the problem is described by the correlation matrix (the unconditional 
p from Table C.l of 0) 



1.00000000 0.47105463 0.35635569 \ 

0.47105463 1.00000000 0.44091699 (7.12) 

0.35635569 0.44091699 1.00000000 / 



12 Johnson distributions are a flexible method of modelling the non-Gaussian aspects of financial 
returns, but in the view of this author at least, the common approach of calibrating them using 
skewness and kurtosis is problematic in the light of recent work exposing the heavily fat tailed 
nature of asset returns. See e.g. [5] [6] for evidence of quartic and cubic power law decay for asset 
returns, rendering kurtosis (and possibly skewness in the light of [6]) calibration highly unstable. 
This is a separate matter to the issue of Omega optimization, but needs to be noted. 

13 The author would be grateful to hear of a published clear example with all aspects of marginals 
and dependency made explicit in a simple model. 
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In the ADMN paper a clear statement of the degrees of freedom of the T-copula is 
not, so far as we can tell, given, but the noise structure of the marginals appears to 
have a d.o.f. around v = 9, so our simplified model will take a dependency structure 
given by a standard multivariate T itself with a single value for the degrees of freedom 
(in full generality we might expect to have a copula d.o.f., and if the marginals were 
also T, three marginal d.o.fs.). That leaves us specifying asset vols and means. For 
the means we will take the /i parameter from Table C.2 of [I]: 

IM = {0.18963989, 0.16829560, 0.2788619} (7.13) 

and for the standard deviations we will take the values also from C2. 



<j t = {2.3251341,2.0430214, 1.8134084} 



(7.14) 



These values are given in pQ as being those pertinent to the holding period, so hope- 
fully constitute relevant parameters for when the GARCH details are washed out in 
this simplified form. 

The VaR parameters of the marginals were given in [1] and we can compare the 
values cited there in Table C2 with those from our simplified T 9 multivariate model. 
These can be worked out from the formula, for general d.o.f. n, for the signed VaR 
(negative numbers denote a loss) 



VaRi(u) = m + aid ^ — 



(7.15) 



where F v (u) is the inverse CDF or quantile function for a Student T distribution 
with v d.o.f. This was written down by Shaw |21) as 



F n ( u ) = sgn u 









n 







\ \ If[«<i,2«,2(l-«)] V2> 2) 



(7.16) 



and the further correction of y ^" ' n 2 ^ is due to a standard T distribution having 
variance n/(n — 2). 



u 


Asset A 
Var(u) 


Asset B 
Var(u) 


Asset C 
Var(u) 


0.05 
0.01 


-3.5693 
-5.59593 


-3.13456 
-4.9153 


-2.65279 
-4.2334 



Table 7.3 

VaR values for simplified Omega model, Tg distribution 



The MC-POPE model is run for J — 50, 000 return realizations for various values 
of the threshold. The computation of Omega is slightly more demanding so that 
number of random portfolios sampled was taken at a lower figure of 2000 x 6 x 8 = 
96000, realized as 2000 samples on each of 8 processors with 6 different biasing values. 
The results are as shown in Table [731 

The general levels of the optimal Omega and corresponding weights found from 
MC-POPE exhibit similar trends to those in p], despite the significant simplification 
of this model compared to that one, as can be verified by comparing Table |7.5| here 
to the relevant components of Tables Gl and G2 in [T]. Note that pQ does not 
report results for s > 0. There is nothing about the Monte Carlo approach requiring 



Monte Carlo Portfolio Optimization for General Objectives and Distributions 



21 



Threshold 


Omega 


Asset A 


Asset B 


Asset C 


b 


optimal 


Wi 


w 2 


w 3 


-4 


662.7 


0.22 


0.26 


0.52 


-3 


180.0 


0.20 


0.25 


0.55 


-2 


37.4 


0.19 


0.26 


0.55 


-1 


7.9 


0.19 


0.23 


0.58 





1.5 


0.07 


0.04 


0.89 


+ 1 


0.4 


1 


0.0 


0.0 


+2 


0.13 


1 


0.0 


0.0 



Table 7.4 

Omega-optimal weights from MC-POPE and simplified ADMN model 



constraints on b, provided the sample extends widely enough to capture more extreme 
values. As reported in |16j . the transformation to an LP problem in fact only works 
when 57 > 1, so there is something essentially different about the two last cases, which 
we now consider. 

7.6. The range of sensible thresholds. Let us return to the definition as a 
ratio of the payoffs of (real- world-measure) call and put: 

m E[(X-b)+] 
[) ~El(b-X)+} [r<) 

We can rewrite this as 

or ,s E[(X-b)+]-E[(b-X)+]+E[(b-X)+] 

n{b) E[(b-X)+] (7 ' 18) 

Now using the same kind of argument as is developed for Put-Call parity in option- 
pricing, the first two terms in the numerator may be rewritten, leaving us with 

E[(X-b)]+E[(b-Xr} _ Mx-j 
W ~ E[(b-X)+] ~E[(b-X)+} + U 9j 

This form of the Q function has been known for some time, see e.g. the presentation 
by Keating [TT]. It is now clear that 

• > 1 if and only if fix > b\ 

• 17 < 1 if and only if fi x < b; 

• Optimization of is only sensible if b is no greater than the maximum return 
of any asset in the portfolio. 

No feasible portfolio can have a return greater than the maximum return of any single 
asset, so inserting a threshold that is too high (while it is computable and indeed 
optimizable) is, in our view, financial nonsense. In practice it appears that the MC- 
POPE optimizer then still works, but picks the configuration with the largest downside 
volatility in order to maximize the denominator in Eqn. ( |7.19 1 . In our example there 



is an unstable transition from an optimal weight set of {0, 0, 1} when b is juts below 
the threshold to {1,0,0} when it is just above. The precise switch point can vary 
due to the fact that the fix is the mean of the realized distribution. In the case of 
Gaussian or Student-T models this can easily be fixed by using antithetic variables 
to se the realized mean to precisely the input mean, but this route may not be so 
clear for other distribution models. In reality it is better practice to keep b clearly 
below the maximum return on any component asset and run the optimization for a 
few values in this region to ensure one has not switched to a bad state. In practice 
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one would not wish to go near the configuration with nearly everything in one asset, 
unless one is a "Twain-extremist" . In a QP context trying to constrain the return to 
be greater than the maximum return on an asset would result in infeasability. In the 
case of Omega, while this function may be worked out for any threshold, trying to do 
an optimization for b > Max\p,i] is possible but financially irrational. 

8. Conclusion. We have developed a method of simplicial sampling, known 
as sequential edge-vertex biasing, which makes it possible to combine Monte Carlo 
simulation of both portfolios and returns to produce a highly flexible approximate 
optimizer capable of optimizing a diverse collection of risk functions. The input 
distribution may be varied at will and the computation parallelizes easily. We call 
this "MC-POPE". 

The method makes good use of parallel computing technology. The results here 
have been computed on a deskside 2 x 4-core machine from 2008, so that in late 2010 
a factor of two is already available on single machines, and 2-3 orders of magnitude 
speed improvement are possible with a cluster now, quite apart from any internal 
improvements that may be possible and the growth in cores per chip. 

Future possible developments include 

• Deterministic hypercube sampling; 

• Refinement; 

• Short-selling management; 

• Robustification based on sampling uncertain parameters. 

The method is slower than traditional QP techniques when applied to traditional 
problems, but the method's generality and scalability with multi-core computing tech- 
nology makes it an attractive option for future asset-allocation work. 

A further intriguing possibility is that of massively accelerating these computa- 
tions by the use of GPGPU boards running CUDA or OpenCL. The best way of 
exploiting the architecture of e.g. Quadro and GTX graphics cards to this end is 
under investigation. 

While most of the examples presented here have been multivariate Gaussian, this 
is solely for the purposes of testing the new methodology against known test problems 
using the QP philosophy coupled to the distributions being characterized only by their 
mean and variance. In reality other marginal distributions and dependency structures 
may be employed, preferably based on sound data analysis. For example, Fergusson 
and Platen [S] argued that the MLE estimator of daily world index log-returns was 
a Student t distribution with four degrees of freedom - such marginals are easily 
simulated e.g. using the methods of [21] . Various studies have captured evidence for 
Student, Variance Gamma, NIG distributions, and these can be simulated either in 
their multivariate forms, where available, or where the dependency is captured by 
a suitable copula. It will be clear that this variation represents a precomputation 
whose results are fed to the MC-POPE optimizer in just the same way as Gaussian 
simulations. 

The author acknowledges that this method represents almost pure brute force, 
but with an effectiveness sharpened by the method of edge- vertex biasing to treat this 
particular class of investment problems. The fact that it makes particularly good use 
of the current direction of evolution of computer technology is a bonus that means that 
it is capable of industrial-strength deployment, with a clear route to improvements in 
precision and/or speed. 
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